Fundamental Effects of Clustering on the Euclidean Embedding of Internet Hosts
Abstract
Network distance estimation schemes based on Euclidean embedding have been shown to provide reasonably good overall accuracy. Although recent studies have revealed that triangle inequality violations (TIVs) inherent in network distances among Internet hosts fundamentally limit their accuracy, Euclidean embedding methods remain appealing and useful for many applications because of their simplicity and scalability. In this paper, we investigate why Euclidean embedding achieves reasonable accuracy despite the prevalence of TIVs, focusing in particular on the effect of clustering among Internet hosts. Through mathematical analysis and experiments, we demonstrate that clustering of Internet hosts reduces the effective dimension of the distances, so a low-dimensional Euclidean embedding suffices to produce reasonable accuracy. Our findings also provide guidelines for selecting landmarks to improve accuracy, and explain why randomly selecting a large number of landmarks improves accuracy.
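As a small illustration of the TIVs mentioned in the abstract, the sketch below counts the fraction of host triples whose longest pairwise distance exceeds the sum of the other two. It is not taken from the paper: the function name `tiv_fraction` and the toy distance matrices are assumptions made here for illustration only.

```python
from itertools import combinations


def tiv_fraction(d):
    """Return the fraction of host triples that violate the triangle inequality.

    d is a symmetric matrix (list of lists) of pairwise distances,
    for example measured round-trip times. Hypothetical helper, not the
    paper's code.
    """
    n = len(d)
    triples = 0
    violations = 0
    for i, j, k in combinations(range(n), 3):
        triples += 1
        a, b, c = sorted((d[i][j], d[j][k], d[i][k]))
        # A triple is a TIV when its longest side exceeds the sum of the other two.
        if c > a + b:
            violations += 1
    return violations / triples if triples else 0.0


# Toy data (made up): the direct A-C distance is longer than the detour via B.
with_tiv = [
    [0, 10, 50],
    [10, 0, 10],
    [50, 10, 0],
]

# Distances between points 0, 1 and 3 on a line: exactly embeddable, no TIV.
on_a_line = [
    [0, 1, 3],
    [1, 0, 2],
    [3, 2, 0],
]

print(tiv_fraction(with_tiv))   # 1.0
print(tiv_fraction(on_a_line))  # 0.0
```

No Euclidean embedding of any dimension can reproduce the first matrix exactly, because Euclidean distances always satisfy the triangle inequality; this is the sense in which TIVs bound the achievable accuracy.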
Similar resources
On Dimensionality of Coordinate-Based Network Distance Mapping
In this paper, we investigate the veracity of a basic premise, “that network distance is Euclidean”, assumed in a class of recently proposed techniques that embed Internet hosts in a Euclidean space for the purpose of estimating the delay or “distance” between them. Using the classical scaling method on a number of network distance measurement datasets, we observe “non-Euclidean-ness” in the ne...
A Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...
Embedding normed linear spaces into $C(X)$
It is well known that every (real or complex) normed linear space $L$ is isometrically embeddable into $C(X)$ for some compact Hausdorff space $X$. Here $X$ is the closed unit ball of $L^*$ (the set of all continuous scalar-valued linear mappings on $L$) endowed with the weak$^*$ topology, which is compact by the Banach--Alaoglu theorem. We prove that the compact Hausdorff space $X$ can ...
Detecting Overlapping Communities in Social Networks using Deep Learning
In network analysis, a community is typically considered to be a group of nodes with a high density of edges among themselves and a low density of edges to other parts of the network. Detecting community structure is important in any network analysis task, especially for revealing patterns between specified nodes. There is a variety of approaches presented in the literature for overlapping...
BotOnus: an online unsupervised method for Botnet detection
Botnets are recognized as one of the most dangerous threats to the Internet infrastructure. They are used for malicious activities such as launching distributed denial of service attacks, sending spam, and leaking personal information. Existing botnet detection methods produce a number of good ideas, but they are far from complete yet, since most of them cannot detect botnets in an early stage ...